MCAT Math Retrieval System for NTCIR-12 MathIR Task
Authors
Abstract
This paper describes the participation of our MCAT search system in the NTCIR-12 MathIR Task. We introduce three granularity levels of textual information, a new approach for generating dependency graphs of math expressions, score normalization, cold-start weights, and unification. We find that all of these modules except the cold-start weights improve the search performance of our system. The use of dependency graphs significantly improves the precision of our system, with relative improvements of up to 24.52% and 104.20% in the Main and Simto subtasks of the arXiv task, respectively. In addition, unification delivers precision improvements of up to 2.90% and 57.14% in the Main and Simto subtasks, respectively. Overall, our best submission achieves a P@5 of 0.5448 in the Main subtask and 0.5500 in the Simto subtask. In the Wikipedia task, our system also performs well in the MathWikiFormula subtask. In the MathWiki subtask, however, our system finishes second because of a problem with handling queries phrased as questions that contain many stop words.
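The results above are reported as P@5 (precision at rank 5), and the dependency-graph gains are stated as relative improvements. The Python sketch below shows how these two quantities are conventionally computed. It is an illustration, not the official NTCIR-12 evaluation code: it assumes binary relevance judgments, and the function names, topics, and document IDs are hypothetical toy data rather than results from the paper.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved items that are relevant (binary relevance assumed)."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Conventionally divide by k even if fewer than k items were retrieved.
    return hits / k


def mean_precision_at_k(runs, k=5):
    """Average P@k over topics; `runs` maps topic -> (ranked_ids, relevant_ids)."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in runs.values()]
    return sum(scores) / len(scores)


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical toy topics and judgments, not NTCIR-12 data.
    runs = {
        "topic-1": (["d1", "d2", "d3", "d4", "d5", "d6"], {"d1", "d3", "d6"}),
        "topic-2": (["d7", "d8", "d9", "d10", "d11"], {"d8", "d9", "d10", "d11"}),
    }
    p5 = mean_precision_at_k(runs, k=5)
    print(f"mean P@5 = {p5:.4f}")
    print(f"relative improvement over 0.40 = {relative_improvement(0.40, p5):.2f}%")
```

Under this definition, a relative improvement above 100%, such as the 104.20% reported for the Simto subtask, means that the precision more than doubled compared with the baseline.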
Similar resources
Math Indexer and Searcher under the Hood: Fine-tuning Query Expansion and Unification Strategies
This paper summarizes the experience of the Math Information Retrieval team of Masaryk University (MIRMU) with the NTCIR-12 MathIR arXiv Main Task and its subtasks. We based our approach on the MIaS system. Based on NTCIR-11 Math-2 Task relevance judgements, we developed an evaluation platform. Using this platform we rigorously evaluated combinations of new features and picked the most promising on...
Tangent-3 at the NTCIR-12 MathIR Task
We present the math-aware search engine Tangent-3 and report its results for the NTCIR-12 MathIR task. Tangent uses a federated search over two indices: 1) a TF-IDF textual search engine (Solr), and 2) a query-by-expression engine. We use an inverted index to store math expressions using pairs of symbols extracted from a Symbol Layout Tree representation built from Presentation MathML. We use a...
Exploring the One-brain Barrier: A Manual Contribution to the NTCIR-12 MathIR Task
This paper compares the search capabilities of a single human brain supported by the text search built into Wikipedia with state-of-the-art math search systems. To achieve this, we compare results of manual Wikipedia searches with the aggregated and assessed results of all systems participating in the NTCIR-12 MathIR Wikipedia Task. For 26 of the 30 topics, the average relevance score of our ma...
The Math Retrieval System of ICST for NTCIR-12 MathIR Task
This paper summarizes the experiences of the ICST team in the NTCIR-12 MathIR main tasks (the arXiv and Wikipedia main tasks). Our approach is based on keywords, structure, and the importance of formulae in a document. A novel hybrid indexing and matching model is proposed to support exact and fuzzy matching. In this hybrid model, both keyword and structure information of formulae are taken into consider...
A Document Retrieval System for Math Queries
We present and analyze the results of our Math search system in the MathIR tasks of the NTCIR-12 Information Retrieval challenge. The Math search engine in the paper uses the co-occurrence-finding techniques of LDA and doc2vec to provide more contextual search. Additionally, it uses common patterns to improve the search output. To combine various scoring algorithms, it uses hybrid ranking mech...